Problem statement:

Nowadays, applications adapt to their users' needs (ergonomics, performance and feasibility) in order to secure customer loyalty.

# Proposed solution:

For this lab assignment, we have an event log from a banking payment application. The goal is to identify the process most frequently followed by users of the application.

Technical details of the method:

Process mining makes process analysis relevant again. It relies on the data recorded by information systems (event logs) and can automatically discover the process models that are actually followed, annotated with frequencies and performance measures. These models also make it easier to spot conformance problems.

# Results:

library(bupaR)
library(edeaR)
library(processmapR)
library(eventdataR)
#install.packages('tidyverse')
library(tidyverse)
library(readr)
library(DiagrammeR)
library(ggplot2)
library(stringr)
library(lubridate)
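credit <- read.csv(file = file.choose())  # choose the event-log CSV file interactively
echant <- credit[1:1000, ]                 # sample of the first 1,000 rows (not used below)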

Transforming the data into an event log

library("bupaR")
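# Timestamps look like "2016/01/08 19:56:43.212": %OS parses the seconds
# together with their fractional part, so the pattern ends in %H:%M:%OS
# (the former "%H:%M:%S.%OS" split the seconds incorrectly)
credit$starttimestamp <- as.POSIXct(credit$Start_Timestamp, tz = "",
                                    format = "%Y/%m/%d %H:%M:%OS")

credit$endtimestamp <- as.POSIXct(credit$Complete_Timestamp, tz = "",
                                  format = "%Y/%m/%d %H:%M:%OS")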
# replace spaces with underscores and drop commas in column names
names(credit) <- str_replace_all(names(credit), c(" " = "_" , "," = "" ))

events <- bupaR::activities_to_eventlog(
  credit,
  case_id = 'Case_ID',
  activity_id = 'Activity',
  resource_id = 'Resource',
  timestamps = c('starttimestamp', 'endtimestamp')
)
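activities_to_eventlog() starts from an activity log (one row per activity instance, with a start and an end timestamp) and produces an event log with one row per timestamp, which is why the summary below reports 800,000 events for 400,000 instances. The base-R sketch below illustrates that reshaping on a small hypothetical log; it is not bupaR's implementation, and the toy columns only mirror the names used above.

```r
# Hypothetical activity log: one row per activity instance,
# with a start and an end timestamp (toy data, not from the CSV)
activity_log <- data.frame(
  Case_ID  = c("c1", "c1", "c2"),
  Activity = c("A_Create Application", "A_Submitted", "A_Create Application"),
  starttimestamp = as.POSIXct(c("2016-01-01 10:00:00", "2016-01-01 10:05:00",
                                "2016-01-02 09:00:00"), tz = "UTC"),
  endtimestamp   = as.POSIXct(c("2016-01-01 10:01:00", "2016-01-01 10:07:00",
                                "2016-01-02 09:02:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
# Identifier shared by the two events of the same instance
activity_log$activity_instance_id <- seq_len(nrow(activity_log))

# One event per instance for the given timestamp column; the column name
# is used as the lifecycle label, as in the summary() output
to_events <- function(log, col) {
  data.frame(
    Case_ID = log$Case_ID,
    Activity = log$Activity,
    activity_instance_id = log$activity_instance_id,
    lifecycle_id = col,
    timestamp = log[[col]],
    stringsAsFactors = FALSE
  )
}

# Stack start and end events, then sort by case and time
event_log <- rbind(to_events(activity_log, "starttimestamp"),
                   to_events(activity_log, "endtimestamp"))
event_log <- event_log[order(event_log$Case_ID, event_log$timestamp), ]
print(event_log)
```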

Event log overview (summary)

summary(events)
## Number of events:  800000
## Number of cases:  57165
## Number of traces:  5256
## Number of distinct activities:  26
## Average trace length:  13.99458
## 
## Start eventlog:  NA
## End eventlog:  NA
##    Case_ID                              Activity         Resource     
##  Length:800000                              : 73546   User_1 : 98752  
##  Class :character   O_Create Offer          : 56048          : 73546  
##  Mode  :character   O_Created               : 56048   User_49: 15556  
##                     O_Sent (mail and online): 51876   User_29: 13864  
##                     W_Validate application  : 50960   User_3 : 13610  
##                     A_Validating            : 50026   User_10: 13388  
##                     (Other)                 :461496   (Other):571284  
##                 Start_Timestamp                 Complete_Timestamp
##                         : 73546                          : 73546  
##  2016/01/08 19:56:43.212:     4   2016/01/08 19:56:43.212:     4  
##  2016/01/29 09:10:58.778:     4   2016/01/29 09:10:58.778:     4  
##  2016/03/02 15:15:40.745:     4   2016/03/02 15:15:40.745:     4  
##  2016/07/11 15:17:14.450:     4   2016/06/16 13:04:35.404:     4  
##  2016/07/15 13:09:09.433:     4   2016/07/11 15:17:14.450:     4  
##  (Other)                :726434   (Other)                :726434  
##       Variant       Variant_index    X.case._ApplicationType
##           : 73546   Min.   :   1.0              : 73546     
##  Variant 1: 55176   1st Qu.:   7.0   Limit raise: 72788     
##  Variant 2: 39140   Median :  25.0   New credit :653666     
##  Variant 3: 29370   Mean   : 390.3                          
##  Variant 5: 21756   3rd Qu.: 257.0                          
##  Variant 8: 20016   Max.   :3159.0                          
##  (Other)  :560996   NA's   :73546                           
##                 X.case._creditGoal X.case._RequestedAmount  Accepted     
##  Car                     :233912   Min.   :     0               : 73546  
##  Home improvement        :190592   1st Qu.:  6000          false: 16718  
##  Existing credit takeover:142022   Median : 12000          true : 39330  
##                          : 73546   Mean   : 15618          NA's :670406  
##  Unknown                 : 64138   3rd Qu.: 20000                        
##  Not speficied           : 28408   Max.   :450000                        
##  (Other)                 : 67382   NA's   :73546                         
##          Action        CreditScore                       EventID      
##             : 73546   Min.   :   0.0                         : 73546  
##  Created    : 96832   1st Qu.:   0.0   Application_1000158214:     2  
##  Deleted    : 55582   Median :   0.0   Application_1000311556:     2  
##  Obtained   :109138   Mean   : 319.9   Application_1000339879:     2  
##  statechange:464902   3rd Qu.: 851.0   Application_100034150 :     2  
##                       Max.   :1142.0   Application_1000557783:     2  
##                       NA's   :743952   (Other)               :726444  
##       EventOrigin     FirstWithdrawalAmount  MonthlyCost    
##             : 73546   Min.   :    0         Min.   :  43.0  
##  Application:308920   1st Qu.:    0         1st Qu.: 150.0  
##  Offer      :252814   Median : 5000         Median : 232.8  
##  Workflow   :164720   Mean   : 7681         Mean   : 273.9  
##                       3rd Qu.:10996         3rd Qu.: 340.8  
##                       Max.   :75000         Max.   :6673.8  
##                       NA's   :743952        NA's   :743952  
##  NumberOfTerms                OfferID       OfferedAmount   
##  Min.   :  5                      : 73546   Min.   : 5000   
##  1st Qu.: 56      Offer_1000226917:     8   1st Qu.: 8000   
##  Median : 74      Offer_1000329580:     8   Median :15000   
##  Mean   : 82      Offer_1000373613:     8   Mean   :17820   
##  3rd Qu.:120      Offer_1000572979:     8   3rd Qu.:24000   
##  Max.   :180      (Other)         :196734   Max.   :75000   
##  NA's   :743952   NA's            :529688   NA's   :743952  
##   Selected      lifecycle.transition activity_instance_id
##       : 73546           : 73546      Length:800000       
##  false: 27830   complete:617316      Class :character    
##  true : 28218   start   :109138      Mode  :character    
##  NA's :670406                                            
##                                                          
##                                                          
##                                                          
##          lifecycle_id      timestamp                       .order     
##  endtimestamp  :400000   Min.   :2016-01-01 10:51:15   Min.   :1e+00  
##  starttimestamp:400000   1st Qu.:2016-03-19 16:12:32   1st Qu.:2e+05  
##                          Median :2016-06-07 10:30:24   Median :4e+05  
##                          Mean   :2016-05-28 15:59:38   Mean   :4e+05  
##                          3rd Qu.:2016-08-02 20:04:28   3rd Qu.:6e+05  
##                          Max.   :2017-01-26 10:11:10   Max.   :8e+05  
##                          NA's   :74194
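The counts reported by summary() can be recomputed by hand: an event is one row, a case is one value of the case identifier, and a trace (variant) is the ordered sequence of activities of a case. The average trace length is the number of events divided by the number of cases, which matches the output above (800,000 / 57,165 ≈ 13.99). A base-R sketch on a hypothetical toy log:

```r
# Hypothetical toy event log: one row per event, in time order within each case
toy <- data.frame(
  case     = c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3"),
  activity = c("A", "B", "C", "A", "B", "C", "A", "C"),
  stringsAsFactors = FALSE
)

n_events <- nrow(toy)
n_cases  <- length(unique(toy$case))

# A trace is the ordered sequence of activities of one case
traces   <- tapply(toy$activity, toy$case, paste, collapse = ",")
n_traces <- length(unique(traces))

# Average trace length = events per case
avg_trace_length <- n_events / n_cases

print(c(events = n_events, cases = n_cases, traces = n_traces,
        avg_trace_length = avg_trace_length))
```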

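-> The log contains 800,000 events, 57,165 cases and 5,256 distinct traces (variants). The 800,000 events correspond to 400,000 activity instances, each recorded with a start event and an end event (see the lifecycle_id column).

### Activity frequency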

events %>% 
  activity_frequency(level = "activity") 
## # A tibble: 26 x 3
##    Activity             absolute relative
##    <fct>                   <int>    <dbl>
##  1 ""                      36773  0.0919 
##  2 A_Accepted              20392  0.0510 
##  3 A_Cancelled              6798  0.0170 
##  4 A_Complete              20311  0.0508 
##  5 A_Concept               20392  0.0510 
##  6 A_Create Application    20392  0.0510 
##  7 A_Denied                 2319  0.00580
##  8 A_Incomplete            14335  0.0358 
##  9 A_Pending               11275  0.0282 
## 10 A_Submitted             13233  0.0331 
## # ... with 16 more rows
events %>% 
  activity_frequency(level = "activity") %>% 
  plot()

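-> The most frequent activities are O_Created, O_Create Offer and O_Sent (mail and online), ignoring the empty label "" that comes from the blank rows of the CSV file. Relative frequencies are computed over the 400,000 activity instances (e.g., 20,392 / 400,000 ≈ 0.051 for A_Accepted).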
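The absolute and relative frequencies returned by activity_frequency(level = "activity") amount to a count per activity label and that count divided by the total number of activity instances. A base-R sketch on hypothetical activity instances:

```r
# Hypothetical activity instances (one entry per instance, not per event)
activity <- c("O_Create Offer", "O_Created", "O_Create Offer", "O_Created",
              "A_Submitted")

absolute <- table(activity)
relative <- prop.table(absolute)  # absolute / total number of instances

freq <- data.frame(
  Activity = names(absolute),
  absolute = as.integer(absolute),
  relative = as.numeric(relative),
  stringsAsFactors = FALSE
)
freq <- freq[order(-freq$absolute), ]  # most frequent first
print(freq)
```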

Processes in which a given activity must be present

events %>% 
  filter_activity_presence(activities = c('A_Cancelled')) %>% 
  activity_frequency(level = "activity") 
## # A tibble: 21 x 3
##    Activity             absolute relative
##    <fct>                   <int>    <dbl>
##  1 A_Accepted               6798  0.0724 
##  2 A_Cancelled              6798  0.0724 
##  3 A_Complete               6738  0.0718 
##  4 A_Concept                6798  0.0724 
##  5 A_Create Application     6798  0.0724 
##  6 A_Incomplete              933  0.00994
##  7 A_Submitted              4939  0.0526 
##  8 A_Validating              997  0.0106 
##  9 O_Cancelled              9121  0.0972 
## 10 O_Create Offer           9121  0.0972 
## # ... with 11 more rows
plt <- events %>%
  filter_activity_presence(activities = c('A_Cancelled')) %>%
  activity_frequency(level = "activity")
plot(plt)
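filter_activity_presence() acts as a case filter: it keeps every event of the cases that contain the listed activity, which is why activities other than A_Cancelled (A_Accepted, O_Create Offer, etc.) still appear in the output above. A base-R sketch of that idea on a hypothetical log:

```r
# Hypothetical toy event log
toy <- data.frame(
  case     = c("c1", "c1", "c2", "c2", "c2", "c3"),
  activity = c("A_Create Application", "A_Cancelled",
               "A_Create Application", "O_Create Offer", "A_Accepted",
               "A_Create Application"),
  stringsAsFactors = FALSE
)

# Cases in which A_Cancelled occurs at least once
cancelled_cases <- unique(toy$case[toy$activity == "A_Cancelled"])

# Keep all events of those cases, not only the A_Cancelled events
filtered <- toy[toy$case %in% cancelled_cases, ]
print(table(filtered$activity))
```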

-> In the cases that contain A_Cancelled, the most frequent activities are O_Created, O_Create Offer and O_Cancelled.

### Process map

library(DiagrammeRsvg)
library(rsvg)
events %>%
  filter_activity_frequency(percentage = 1.0) %>% # keep activities covering 100% of the log (none removed)
  filter_trace_frequency(percentage = .80) %>%    # keep the most frequent traces, covering 80% of the cases
  process_map(render = TRUE)

The map shows how the activities are linked to one another, which lets us identify the path most often followed by customers.
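The arrows of a frequency process map are built from directly-follows relations: within a case, activity X is immediately followed by activity Y, and the arrow X -> Y is labelled with how often this happens. The sketch below counts those pairs in base R on a hypothetical log; processmapR also adds start and end nodes, which are omitted here.

```r
# Hypothetical toy event log, in time order within each case
toy <- data.frame(
  case     = c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3"),
  activity = c("A", "B", "C", "A", "B", "C", "A", "C"),
  stringsAsFactors = FALSE
)

# For each case, pair each activity with the one that directly follows it
pairs <- do.call(rbind, lapply(split(toy$activity, toy$case), function(acts) {
  if (length(acts) < 2) return(NULL)  # a single-event case has no pair
  data.frame(from = head(acts, -1), to = tail(acts, -1),
             stringsAsFactors = FALSE)
}))

# Count each (from, to) pair and keep only the pairs that occur
edges <- as.data.frame(table(from = pairs$from, to = pairs$to),
                       stringsAsFactors = FALSE)
edges <- edges[edges$Freq > 0, ]
print(edges)
```

Calling table(pairs$from, pairs$to) directly returns the same counts in matrix form, which is the layout of a precedence matrix.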

library(DiagrammeRsvg)
library(rsvg)
events %>%
  filter_activity_frequency(percentage = 1.0) %>% # keep activities covering 100% of the log (none removed)
  filter_trace_frequency(percentage = .80) %>%    # keep the most frequent traces, covering 80% of the cases
  process_map(performance(mean, "mins"),          # annotate with mean times, in minutes
              render = TRUE)
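With performance(mean, "mins"), the map shows time statistics instead of counts. As a rough illustration of one such statistic, the sketch computes the mean processing time (end timestamp minus start timestamp) of each activity in minutes on hypothetical instances; the exact quantities processmapR places on nodes and edges should be checked in its documentation.

```r
# Hypothetical activity instances with start and end timestamps
inst <- data.frame(
  Activity = c("W_Validate application", "W_Validate application",
               "O_Create Offer"),
  start    = as.POSIXct(c("2016-01-01 10:00:00", "2016-01-02 11:00:00",
                          "2016-01-01 09:00:00"), tz = "UTC"),
  complete = as.POSIXct(c("2016-01-01 10:30:00", "2016-01-02 11:10:00",
                          "2016-01-01 09:01:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Processing time of each instance, in minutes
inst$minutes <- as.numeric(difftime(inst$complete, inst$start, units = "mins"))

# Mean processing time per activity
mean_minutes <- aggregate(minutes ~ Activity, data = inst, FUN = mean)
print(mean_minutes)
```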

Frequency matrix of directly-following activities (precedence matrix)

# precedence matrix ####
precedence_mat <- events %>%
  filter_activity_frequency(percentage = 1.0) %>% # keep activities covering 100% of the log (none removed)
  filter_trace_frequency(percentage = .80) %>%    # keep the most frequent traces, covering 80% of the cases
  precedence_matrix()
plot(precedence_mat)

This is another way to identify which activities and operations are performed one after the other.

## Activity traces

# trace explorer: most frequent traces, covering 70% of the cases
traces_plot <- events %>%
  trace_explorer(coverage = 0.7)
plot(traces_plot)
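Both trace_explorer(coverage = 0.7) and filter_trace_frequency(percentage = .80) rely on the same idea, as we understand it: sort the variants by frequency and keep the most frequent ones until the requested share of cases is covered. The base-R sketch below implements that idea on hypothetical traces; bupaR's exact handling of ties and boundaries may differ.

```r
# Hypothetical traces, one per case
traces <- c(c1 = "A,B,C", c2 = "A,B,C", c3 = "A,C", c4 = "A,B,C", c5 = "A,D")

# Variants sorted from most to least frequent
freq <- sort(table(traces), decreasing = TRUE)

# Cumulative share of cases covered by the top variants
cum_share <- cumsum(as.numeric(freq)) / length(traces)

# Smallest number of top variants whose share reaches the coverage
coverage <- 0.7
n_keep <- which(cum_share >= coverage)[1]
kept <- names(freq)[seq_len(n_keep)]
print(kept)
```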

events %>%
  filter_activity_frequency(percentage = 1.0) %>% # keep activities covering 100% of the log (none removed)
  filter_trace_frequency(percentage = .80) %>%    # keep the most frequent traces, covering 80% of the cases
  group_by(X.case._ApplicationType) %>%           # one summary per application type
  throughput_time('log', units = 'hours')         # case duration (first to last event), in hours
## # A tibble: 3 x 10
##   X.case._Applica~     min    q1 median  mean    q3   max st_dev   iqr
##   <fct>              <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
## 1 New credit        0.110   281.   526.  535.  762. 1837.     4    NA 
## 2 ""               NA        NA     NA   NaN    NA    NA  36773    NA 
## 3 Limit raise       0.0597  219.   326.  403.  593. 1197.   235.  374.
## # ... with 1 more variable: NA. <dbl>

The most common use of the application is taking out a new bank credit (application type "New credit", which accounts for 653,666 events in the summary, against 72,788 for "Limit raise").
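throughput_time() measures, for each case, the time between its first and last event; combined with group_by(), it summarizes those durations per group. The base-R sketch below reproduces that definition on a hypothetical log with an application-type attribute and reports the median per type; it is an illustration, not bupaR's code.

```r
# Hypothetical event log with a case attribute (application type)
ev <- data.frame(
  case = c("c1", "c1", "c2", "c2", "c3", "c3"),
  type = c("New credit", "New credit", "New credit", "New credit",
           "Limit raise", "Limit raise"),
  timestamp = as.POSIXct(c("2016-01-01 08:00:00", "2016-01-02 08:00:00",
                           "2016-01-03 08:00:00", "2016-01-03 20:00:00",
                           "2016-01-04 08:00:00", "2016-01-04 14:00:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)

# Throughput time of a case = last timestamp - first timestamp, in hours
secs  <- as.numeric(ev$timestamp)
hours <- (tapply(secs, ev$case, max) - tapply(secs, ev$case, min)) / 3600

# Application type of each case (constant within a case)
case_type <- tapply(ev$type, ev$case, function(x) x[1])

per_case <- data.frame(case = names(hours), hours = as.vector(hours),
                       type = as.vector(case_type), stringsAsFactors = FALSE)

# Median throughput time per application type
median_hours <- aggregate(hours ~ type, data = per_case, FUN = median)
print(median_hours)
```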

plott <- events %>%
  filter_activity_frequency(percentage = 1.0) %>% # keep activities covering 100% of the log (none removed)
  filter_trace_frequency(percentage = .80) %>%    # keep the most frequent traces, covering 80% of the cases
  group_by(X.case._ApplicationType) %>%
  throughput_time('log', units = 'hours')

plot(plott)
## Warning: Removed 36777 rows containing non-finite values (stat_boxplot).

events %>%
  filter_trace_frequency(percentage = .80) %>%    # keep the most frequent traces, covering 80% of the cases
  group_by(X.case._creditGoal) %>%
  throughput_time('log', units = 'hours')
## # A tibble: 14 x 10
##    X.case._creditG~      min    q1 median  mean    q3   max st_dev   iqr
##    <fct>               <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
##  1 Existing credit~   1.19    305.   529.  548.  763. 1697.     2    NA 
##  2 Home improvement   0.178   290.   500.  531.  762. 1837.     1    NA 
##  3 Car                0.144   242.   450.  506.  762. 1828.     1    NA 
##  4 ""                NA        NA     NA   NaN    NA    NA  36773    NA 
##  5 Not speficied      0.595   334.   733.  581.  779. 1442.   279.  446.
##  6 Caravan / Camper  56.2     213.   355.  450.  745. 1334.   273.  532.
##  7 Unknown            0.0597  232.   338.  432.  734. 1314.   248.  501.
##  8 Motorcycle        57.8     230.   438.  499.  764. 1338.   285.  534.
##  9 Business goal    233.      310.   660.  553.  756.  851.   245.  446.
## 10 Boat              55.0     267.   428.  517.  755. 1378.   282.  488.
## 11 Extra spending ~  43.2     265.   443.  503.  756. 1188.   262.  491.
## 12 Remaining debt ~   0.62    345.   735.  617.  782. 1669.   317.  437.
## 13 Tax payments      51.4     288.   432.  490.  742. 1165.   266.  455.
## 14 Debt restructur~ 732.      732.   732.  732.  732.  732.    NA     0 
## # ... with 1 more variable: NA. <dbl>

The credits are mainly requested for buying a car, home improvement, or taking over an existing credit (the three most frequent values of X.case._creditGoal in the summary).

plot2 <- events %>%
  filter_trace_frequency(percentage = .80) %>%    # keep the most frequent traces, covering 80% of the cases
  group_by(X.case._creditGoal) %>%
  throughput_time('log', units = 'hours')
plot(plot2)
## Warning: Removed 36777 rows containing non-finite values (stat_boxplot).

## Interpretation: